IDENTIFICATION OF ALLELES



FIELD OF THE INVENTION 

The invention is related to the area of genome analysis. In particular it is 
related to the field of identification of genotypes. 

BACKGROUND OF THE INVENTION

Obtaining genotype information on thousands of polymorphisms in a 
highly parallel fashion is becoming an increasingly important task in mapping 
disease loci, in identifying quantitative trait loci, in diagnosing tumor loss of 
heterozygosity, and in performing association studies. A currently available
method for simultaneously evaluating large numbers of genetic polymorphisms
involves hybridization to allele-specific probes on high density oligonucleotide 
arrays. In order to practice that method, redundant sets of hybridization probes, 
typically twenty or more, are used to score each allelic marker. A high degree of 
redundancy is required to reduce noise and achieve an acceptable level of
accuracy. Even this level of redundancy is insufficient to unambiguously score
heterozygotes or to quantitatively determine allele frequency in a population. 
Because of these limitations, there is a need in the art for more reliable and more 
quantitative methods to perform genomic analysis at polymorphic loci. 

The technique of allele-specific polymerase chain reaction (ASPCR) can be
applied to allele identification and quantitative analysis of allele frequency.
However, this technique suffers from cross reactivity between amplified products 
when hybridizing to probes which differ by only a single nucleotide base. A
partial solution to the cross-reactivity problem has been achieved by the addition
of sequence tags to the ASPCR primers. However, the incorporation of tags in
ASPCR primers can itself interfere with the identification of the amplification
products because unreacted primers or partially extended products can compete
with full products for hybridization to the probes. Thus, there is a further need in the art
for methods and materials which permit the use of tags in the analysis of 
polymorphic loci without interference from incompletely reacted products. 

SUMMARY OF THE INVENTION 

It is an object of the invention to provide methods and compositions for the
identification of nucleotides at a polymorphic locus in a nucleic acid sequence.
These and other objects of the invention are provided by one or more of the 
embodiments described below. 

One embodiment of the invention provides a method for determining a 
nucleotide at a polymorphic locus. A region of double stranded DNA comprising
a polymorphic locus is amplified to form an amplified DNA product using a pair
of primers. A first primer of the pair terminates at its 3' end at the polymorphic 
locus. The first primer also comprises a 3' portion which is complementary to the 
region of double stranded DNA and a 5' portion which comprises the same 
sequence as all or a portion of a probe on a solid support which is not
complementary to the region of double stranded DNA. The amplified DNA
products are labeled to form labeled amplified DNA products. The labeled 
amplified DNA products are hybridized to the probe on the solid support. 

Another embodiment of the invention provides a pair of primers which 
specifically amplify an allelic form of a polymorphic locus. A first primer of the
pair comprises a 3' portion which is complementary to a region of DNA
comprising the polymorphic locus. The first primer also comprises a 5' portion
which comprises the same sequence as all or part of a probe on a solid support.
The sequence is not complementary to the region of DNA comprising the
polymorphic locus. The first primer terminates in a 3' nucleotide which is
complementary to a distinct allelic form of the polymorphic locus.

The invention thus provides the art with sensitive and specific methods and
compositions for identification of polymorphic nucleotides in a DNA sample
which may be from one or more individuals.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 illustrates a method of determining nucleotides at a polymorphic
locus. The first step shows the use of allele-specific polymerase chain reaction
(ASPCR) primers to amplify only those regions of the double stranded DNA
sample which contain a specific nucleotide at a polymorphic locus (G in this
example). In the second step, the amplification product is labeled at each 3' end,
for example, by attaching a single labeled ddUTP residue using a terminal
transferase reaction. The third step shown is the hybridization of the labeled
amplification product to a solid support ("antisense tag chip") to which a probe is
attached that is identical in complementarity to a tag at the 5' end of the ASPCR
primer.

DETAILED DESCRIPTION OF THE INVENTION

It is a discovery of the present inventors that determination of polymorphic
loci can be accomplished with great specificity and sensitivity by hybridizing
uniquely tagged allele-specific nucleic acid sequences to corresponding tag
probes in an array. A nucleic acid sample is amplified by allele-specific
amplification so as to uniquely tag nucleic acids corresponding to different alleles
of a polymorphic locus. Nucleic acids corresponding to different alleles are
linked to different tags, each of which comprises a sequence that is identical in
sequence to all or part of a probe on a detection array. The use of a unique tag for
each allele eliminates the problem of cross-hybridization which arises with other
methods. Furthermore, the use of allele-specific primers with the same 
complementarity as the probes eliminates interference from unreacted primers 
and partially extended products. Such problems can prevent unambiguous 
determination of polymorphic alleles. 



A diploid organism, for example a human, possesses two copies of each
type of autosomal gene in its somatic cells. A population of organisms may 
contain several variants of a gene, known as alleles. A "polymorphic locus" is a 
location within a genome which exhibits genetic polymorphism, i.e., a location
where one or more nucleotides may vary in the genomes of different individuals.
Such variation can arise due to inherited mutations, or it can arise as de novo
mutations in an individual organism. An "allelic form" is a specific variant of a
gene embodied in a nucleic acid molecule, e.g., genomic DNA, an RNA
transcript, a cDNA, a synthetic nucleic acid bearing the sequence of the variant,
or a protein molecule encoded by the variant. Different allelic forms can differ from
one another by single basepair substitutions (also called single nucleotide
polymorphisms or SNPs), or they can differ by two or more bases. Different
allelic forms can also arise by insertion or deletion mutations. Any distinct allelic
form at a polymorphic locus can be identified and quantified with the methods
described here.

Briefly, three steps can be employed to determine a polymorphic 
nucleotide: allele-specific amplification, labeling, and hybridizing. An optional 
fourth step involves the optical detection of fluorescently labeled, hybridized 
amplification products. Allele-specific amplification of a nucleic acid sample
according to the present invention uses allele-specific primers, each of which has
an allele-specific 3' end and a 5' portion comprising a unique "antisense" tag. The
amplification products for each allele are labeled and hybridized to a solid
support comprising probes, such as an array, beads, or microtiter dish. Each
probe in the array comprises the same or nearly the same sequence as the
antisense tag of an allele-specific primer.

The genotype of an individual at a polymorphic locus can be determined 
from the hybridization. If the nucleic acid sample being tested is derived from a 
population or group of individual organisms, the ratio of allelic forms in the 
population can be quantified. A plurality of polymorphic loci in a given nucleic
acid sample can be simultaneously analyzed in a single reaction mixture using a
plurality of pairs of primers.

A tag is a nucleotide sequence which is identical or nearly identical to the
sequence of all or a portion of a respective probe in an array. A tag sequence 
need only be sufficiently identical to its respective probe sequence to permit 
specific binding between the complement of the tag and the probe, without
sufficient binding to other probes to confuse the determination of hybridization.
However, it is preferred that each base of a tag sequence be identical to each 
corresponding base of the respective probe sequence. Tags and probes need not 
be identical in length. The appropriate length for tags and probes is such that a 
tag's complement and its respective probe hybridize with high affinity and high
specificity. Important factors include, for example, length of the tag and probe
sequences, the number and position of mismatching bases, if any, and the
characteristics of the solution in which hybridization is carried out, especially its
ionic strength and pH. Each probe and its corresponding tag share a region of
identical sequence that is at least 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 35, or 40
nucleotides in length.
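
By way of illustration only, the length of the region of identical sequence shared by a tag and its probe can be checked computationally. The following Python sketch is not part of the disclosed method; the function names, the example sequences, and the default threshold of 12 nucleotides (the smallest value listed above) are assumptions chosen for the example.

```python
def longest_shared_run(tag: str, probe: str) -> int:
    """Return the length of the longest contiguous sequence found in both strings."""
    tag, probe = tag.upper(), probe.upper()
    best = 0
    # prev[j] is the length of the shared run ending at tag[i-2] and probe[j-1]
    prev = [0] * (len(probe) + 1)
    for i in range(1, len(tag) + 1):
        curr = [0] * (len(probe) + 1)
        for j in range(1, len(probe) + 1):
            if tag[i - 1] == probe[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def tag_matches_probe(tag: str, probe: str, min_shared: int = 12) -> bool:
    """True if tag and probe share an identical region of at least min_shared bases.

    min_shared=12 is an illustrative default taken from the shortest length above.
    """
    return longest_shared_run(tag, probe) >= min_shared


if __name__ == "__main__":
    # Hypothetical sequences: the probe contains the whole 12-base tag.
    print(tag_matches_probe("ACGTACGTTGCA", "TTACGTACGTTGCAGG"))
```

Tags and probes need not be the same length under this check; only the shared identical region is measured.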

The sequence tags are typically totally unrelated to the sequences of the 
polymorphic alleles which are being analyzed. The sequence tags are chosen for 
their favorable hybridization characteristics. The tags are typically selected so 
that they have similar hybridization characteristics and minimal
cross-hybridization to other tag sequences. Each sequence tag is attached to a specific
gene or genetic marker, and then serves as a label for that particular gene or
genetic marker.

A generic tag array, corresponding to the pre-selected tag sequences, can be
fabricated and used to detect the presence or absence or ratio of specific allelic
forms in a test sample. See U.S. Patent No. 5,800,992, application Serial No.
08/626,285 filed April 4, 1996, and EP application no. 97302313.8 which are
expressly incorporated by reference herein. Advantages of the disclosed method
include that just one generic tag solid support can be used to genotype any
genetic marker, i.e., no specific customized genotyping solid support is needed.
In addition, the pre-selected probe sequences synthesized on a tag solid support
guarantee good hybridization results between the probe and the complement of
the tag, with little interference from cross-hybridization by closely related allelic
sequences.

The DNA analyzed can be of any source, including genomic, nuclear, 
cDNA, mitochondrial DNA, macronuclear DNA, and micronuclear DNA. The 
DNA can be isolated from one or more individuals. The DNA can be purified to 
contain only a certain subset of cellular DNA if desired. Any type of 
amplification reaction can be used, including PCR, ligase chain reaction, 
transcription amplification, and self-sustained sequence replication. Thus, 
appropriate enzymes such as DNA polymerase or DNA ligase will be used as 
desired by the artisan. 

Each pair of primers specifically amplifies one allelic form of a 
polymorphic locus. Each pair of primers comprises a first primer and a second 
primer. The first and second primers are complementary to opposite strands of 
the DNA region to be amplified. The first primer of the pair terminates in a 3' 
nucleotide which is complementary to a specific allelic form but not 
complementary to other allelic forms. The first primer contains a portion at its 3' 
end which is complementary to the region of double stranded DNA to be 
amplified. The length of the 3' portions of both the first and second primers 
should be sufficient to produce specific binding of the primers to the 
complementary sequences in the region of DNA to be amplified, resulting in 
amplification of that region and little or no detectable amplification of other 
regions. The first primer also contains a portion at its 5' end which comprises a
tag. The tag has the same sequence as all or a portion of a probe on a solid 
support. The tag sequence is specifically chosen so as not to share a 
complementary region with the region of DNA which is amplified; this prevents 
cross-hybridization of other labeled amplified products with the probe. 
Preferably, no probe on the solid support shares a region of complementary 
sequence with an amplified DNA region greater than 2, 3, 4, 5, 6, 8, or 10 
consecutive bases. The second primer may, but need not, contain a tag. It is
preferred that the primers hybridize to a double stranded nucleic acid in locations
which are not more than 2 kb apart, and preferably which are much closer
together, such as not more than 1 kb, 0.5 kb, 0.2 kb, 0.1 kb, 0.01 kb or 0.001 kb
apart. A suitable DNA polymerase can be used as is known in the art.
Thermostable polymerases are particularly convenient for thermal cycling of
rounds of primer hybridization, polymerization, and melting. Amplification of
single stranded nucleic acids can also be employed.
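
The structure of the first primer described above (a 5' tag followed by a 3' portion that ends on the polymorphic nucleotide) can be sketched in Python as follows. This is an illustrative model only: the function names, the example sequences, and the short 3' portion are hypothetical, and the all-or-nothing matching rule is a simplifying assumption rather than a description of real polymerase behavior.

```python
def allele_specific_primer(tag: str, top_strand: str, snp_index: int,
                           allele: str, anneal_len: int = 18) -> str:
    """Build a primer: 5' tag + 3' portion ending at the polymorphic base.

    top_strand is written 5'->3'; snp_index is the 0-based position of the
    polymorphic nucleotide; allele is the base the primer should select.
    The primer has the same sequence as top_strand, so it anneals to the
    opposite strand. anneal_len is an illustrative parameter.
    """
    start = snp_index - anneal_len + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for the 3' portion")
    three_prime = top_strand[start:snp_index].upper() + allele.upper()
    return tag.upper() + three_prime


def primes_allele(primer: str, tag: str, top_strand: str, snp_index: int) -> bool:
    """Idealized check: the 3' portion must match the template exactly,
    including the terminal base at the polymorphic position."""
    three_prime = primer[len(tag):]
    start = snp_index - len(three_prime) + 1
    if start < 0:
        return False
    return top_strand[start:snp_index + 1].upper() == three_prime


if __name__ == "__main__":
    tag = "TTTTCCCC"                       # hypothetical tag sequence
    template_g = "GATTACAGATGACAGGCC"      # G allele at index 10
    template_a = "GATTACAGATAACAGGCC"      # A allele at index 10
    primer_g = allele_specific_primer(tag, template_g, 10, "G", anneal_len=6)
    print(primer_g)
    print(primes_allele(primer_g, tag, template_g, 10))
    print(primes_allele(primer_g, tag, template_a, 10))
```

In this model a primer built for the G allele is extended on the G template only, which mirrors the requirement that the 3' nucleotide be complementary to one allelic form and not to the others.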

The amplified DNA product is labeled prior to its hybridization to a probe 
on a solid support. The method of labeling can be any method known in the art, 
but it should not interfere with hybridization of the amplified DNA product with 
the respective probe sequence. 

The amplified DNA product can be hybridized to one or more probes
which are immobilized to known locations on a solid support, e.g., in an array,
microarray, high density array, beads or microtiter dish. Each probe is of
identical complementarity to a corresponding tag on a primer. This assures that
no unreacted primer will interfere or compete with hybridization of the labeled
amplification product to the probe. The quantities of the label at distinct
locations on the solid support can be compared, and the genotype can be 
determined for the one or more individuals or organisms from whom the DNA in 
the sample was obtained. 

Nucleotides at two or more polymorphic loci can be determined
simultaneously. The amplification can be performed in one or more reaction
vessels. The hybridization can be performed using probes on the same array or 
different arrays. The results can be expressed qualitatively (presence or absence 
of given nucleotides at each polymorphic locus in a DNA sample) or 
quantitatively (ratio of different nucleotides at each polymorphic locus). 
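
As a non-limiting sketch of how qualitative and quantitative results might be derived from label quantities at two tag probes for one locus, consider the following Python example. The allele names, the cutoff of 0.8, and the function names are assumptions made for illustration; suitable thresholds would need to be established empirically.

```python
def allele_fraction(signal_a: float, signal_b: float) -> float:
    """Fraction of total label attributable to allele A (quantitative result)."""
    total = signal_a + signal_b
    if total <= 0:
        raise ValueError("no signal detected at either tag probe")
    return signal_a / total


def call_genotype(signal_a: float, signal_b: float, hom_cutoff: float = 0.8) -> str:
    """Qualitative call for a single diploid individual.

    hom_cutoff is a hypothetical threshold: fractions at or above it are
    called homozygous A, fractions at or below (1 - hom_cutoff) homozygous B,
    and anything in between heterozygous.
    """
    frac = allele_fraction(signal_a, signal_b)
    if frac >= hom_cutoff:
        return "AA"
    if frac <= 1 - hom_cutoff:
        return "BB"
    return "AB"


if __name__ == "__main__":
    print(call_genotype(950.0, 50.0))
    print(call_genotype(480.0, 520.0))
    print(allele_fraction(300.0, 100.0))
```

For a pooled sample from several individuals, the value returned by `allele_fraction` would serve as an estimate of allele frequency rather than as the basis for a genotype call.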

A set of primers can be used for determining the genotype of an individual or
allelic frequency in a population. The set comprises two or more pairs of primers
as described above. Each of the pairs specifically amplifies a different allelic
form at the polymorphic locus. Two or more pairs of primers can be supplied
together in a single container as a kit. The container may be subdivided and may
contain multiple vessels. So long as the components are physically attached to
each other or in a single package they form a kit. Such kits can additionally
include a solid support comprising at least two probes, where each probe contains
a different tag. Instructions for use according to the disclosed method, enzymes 
for amplification, buffers and control samples can be included as components in 
the kit. 

Providing a Nucleic Acid Sample

The terms "nucleic acid" or "nucleic acid molecule" refer to a 
deoxyribonucleotide or ribonucleotide polymer in either single- or
double-stranded form, and unless otherwise limited, would encompass analogs of a
natural nucleotide that can function in a similar manner as naturally occurring
nucleotides. Suitable nucleic acid samples can contain polymorphic loci of
interest. Suitable nucleic acid samples can also contain nucleic acids derived 
from a polymorphic locus of interest. As used herein, a nucleic acid derived from 
a polymorphic locus refers to a nucleic acid for whose synthesis the genomic 
DNA containing the polymorphic locus or a subsequence thereof has ultimately
served as a template. Thus, a DNA amplified from genomic DNA, an RNA
transcribed from the amplified DNA, an mRNA transcribed from the genomic 
DNA, or a cDNA reverse transcribed from the mRNA, etc., are all derived from 
the polymorphic locus, and detection of such derived products is indicative of the 
presence and/or abundance of the original polymorphic locus in a sample. Thus,
suitable samples include, but are not limited to, isolated genomic DNA
containing the gene or genes containing the polymorphic locus, an RNA 
transcript derived from the isolated genomic DNA, cDNA reverse transcribed 
from the transcript, cRNA transcribed from the cDNA, DNA amplified from the 
genes, RNA transcribed from amplified DNA, and the like. If the sample is a
non-DNA sample, it can be converted to double stranded DNA prior to
amplification per the invention, for example using reverse transcriptase and/or 
DNA polymerase. The sample can be derived from a single individual organism, 
e.g., human, animal, plant, or microbial. The sample can alternatively be derived 
from two or more organisms, in which case the determination will reveal
information about allelic frequency within the population from which the nucleic
acid sample was derived.

The nucleic acid sample can be a homogenate of cells or tissues or other
biological samples. Preferably, the nucleic acid sample is a total DNA 
preparation of a biological sample. More preferably in some embodiments, the 
nucleic acid sample is the total genomic DNA isolated from a biological sample. 
The nucleic acid sample can be the total mRNA isolated from a biological
sample. Those of skill in the art will appreciate that the total mRNA prepared 
with most methods includes not only the mature mRNA, but also the RNA 
processing intermediates and nascent pre-mRNA transcripts. For example, total 
mRNA purified with a poly(dT) column contains RNA molecules with poly(A)
tails. Those poly(A)+ RNA molecules could be mature mRNA, RNA processing
intermediates, nascent transcripts or degradation intermediates. 

Biological samples can be of any biological tissue or fluid or cells from any 
organism. Frequently the sample will be a "clinical sample," which is a sample 
derived from a patient. Clinical samples provide a rich source of information
regarding the various alleles of a gene and their relation to disease. Some
embodiments of the invention can be employed to detect mutations and to 
identify the phenotype of mutations. Such embodiments have extensive 
applications in clinical diagnostics and clinical studies. Typical clinical samples 
include, but are not limited to, sputum, blood, blood cells (e.g., white cells),
tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid, or
cells therefrom. Biological samples can also include sections of tissues, such as 
frozen sections or formalin-fixed sections taken for histological purposes. Cell 
cultures are another typical source of biological samples. Cell cultures used as a 
source of DNA or RNA can be derived from a clinical sample, or can be supplied
from a primary cell culture, a subculture, or a cell line from any organism.

Allele-Specific Amplification 

The nucleic acid sample is subjected to amplification prior to hybridization 
and detection of an allelic marker. Methods for amplification of a nucleic acid 
are well known in the art. In general, amplification of a nucleic acid sample
employs a pair of single-stranded oligonucleotide primers together with an
enzyme, e.g., DNA polymerase, which replicates (amplifies) a region of the
nucleic acid sample, resulting in multiple copies of the region delimited by the
sequences that are complementary to the primers. The pair of primers is chosen 
so as to amplify a region of the nucleic acid sample containing the polymorphic 
locus. The size of the region amplified is not critical, but the region must be
sufficiently large to include not only the polymorphic locus but also enough
sequence on either side of the polymorphic locus to permit highly specific 
binding of the pair of primers to the chosen region. Strategies for designing and 
synthesizing primers suitable for amplification of a specific region of a nucleic 
acid sample are known in the art. 

A preferred amplification method is allele-specific amplification. Okayama
et al., J. Lab. Clin. Med. 114:105-113 (1989). In allele-specific amplification, a
nucleotide substitution which is characteristic of a given allele is placed at the 3'
end of one of the primers. Only that allele which is complementary to the primer
will be amplified; another allele, which contains a different nucleotide
substitution and is not complementary to the 3' end of the primer, will not be
amplified. The amplification reaction itself can be carried out according to the
polymerase chain reaction (PCR) (see PCR Protocols, A Guide to Methods and
Applications, Innis et al., Academic Press, Inc. N.Y., (1990)) or another suitable
amplification method. Other suitable amplification methods include, but are not
limited to, ligase chain reaction (LCR) (see Wu and Wallace, Genomics, 4: 560
(1989); Landegren, et al., Science, 241: 1077 (1988); and Barringer, et al., Gene,
89: 117 (1990)), transcription amplification (Kwoh, et al., Proc. Natl. Acad. Sci.
USA, 86: 1173 (1989)), and self-sustained sequence replication (Guatelli, et al.,
Proc. Natl. Acad. Sci. USA, 87: 1874 (1990)).

One of skill in the art will appreciate that whatever amplification method is
used, if a quantitative result is desired, care must be taken to use a method that
maintains or controls for the relative frequencies of the amplified nucleic acids to
achieve quantitative amplification. Methods of quantitative amplification are
well known to those of skill in the art. For example, quantitative PCR may
involve simultaneously co-amplifying a known quantity of a control sequence
using the same primers used to amplify the nucleic acids of interest. This
provides an internal standard that can be used to calibrate the PCR reaction. The
high density array can then include probes specific to the internal standard for
quantification of the amplified nucleic acid. Detailed protocols for quantitative
PCR are provided in PCR Protocols, A Guide to Methods and Applications, Innis
et al., Academic Press, Inc. N.Y., (1990).
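
The role of the internal standard can be illustrated with a simple calculation. The Python sketch below assumes, purely for the purpose of the example, that array signal is proportional to the starting quantity of each co-amplified sequence; the function name and numerical values are hypothetical, and the cited protocols should be consulted for validated procedures.

```python
def calibrate_quantity(target_signal: float, standard_signal: float,
                       standard_quantity: float) -> float:
    """Estimate the starting quantity of a target from an internal standard.

    Assumes a linear, shared response: the target and the co-amplified
    control sequence are amplified and detected with equal efficiency.
    standard_quantity is the known amount of control sequence added.
    """
    if standard_signal <= 0:
        raise ValueError("internal standard produced no signal")
    return target_signal * standard_quantity / standard_signal


if __name__ == "__main__":
    # Hypothetical readings: the target probe is twice as bright as the
    # probe for 10 units of internal standard.
    print(calibrate_quantity(500.0, 250.0, 10.0))
```

Under the stated linearity assumption the estimate scales directly with the ratio of target signal to standard signal.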

Labeling the Amplified DNA Product

The amplified DNA product is labeled by any method known in the art 
prior to hybridization. For example, a single dideoxyribonucleotide (such as 
ddUTP) can be added to the free 3' ends using the enzyme terminal transferase,
which extends the blunt ends of double stranded DNA molecules. See
Roychoudhury, R., 1981, The use of terminal transferase for molecular cloning
and nucleic acid analysis, Gene Amplif. Anal. 2:41-83. The label can be
covalently attached to the dideoxyribonucleotide and can be a fluorescent label
(e.g., fluorescein, Texas red, rhodamine, green fluorescent protein, and the like)
or other label as defined under "Signal Detection" below. Ordinary
deoxyribonucleotides bearing a label can also be used, in which case it is likely 
that more than one nucleotide, and perhaps more than one label moiety, will be 
added at the 3' end of the tag sequence, depending on the reaction conditions 
(e.g., time and temperature) chosen. 

Hybridizing Nucleic Acids to Arrays of Allele-Specific Probes

"Hybridization" refers to the formation of a bimolecular complex of two 
different nucleic acids through complementary base pairing. Complementary 
base pairing occurs through non-covalent bonding, usually hydrogen bonding, of 
bases that specifically recognize other bases, as in the bonding of complementary
bases in double-stranded DNA. In this invention, hybridization is carried out
between a target nucleic acid, which is prepared from the nucleic acid sample by 
allele-specific amplification, and at least one probe which has been immobilized 
on a solid substrate, optionally in an array. 

One of skill in the art will appreciate that an enormous number of array
designs are suitable for the practice of this invention. An array will typically
include a number of probes that specifically hybridize to the sequences
(complements of tags) of interest. It is preferred that an array include one or
more control probes. In one embodiment, the array is a high density array. A
high density array is an array used to hybridize with a target nucleic acid sample
to detect the presence of a large number of allelic markers, preferably more than
10, more preferably more than 100, and most preferably more than 1000 allelic
markers.

High density arrays are suitable for quantifying small variations in the 
frequency of an allelic marker in the presence of a large population of 
heterogeneous nucleic acids. Such high density arrays can be fabricated either by
de novo synthesis on a substrate or by spotting or transporting nucleic acid
sequences onto specific locations of a substrate. Both of these methods produce
nucleic acids which are immobilized on the array at particular locations. Nucleic
acids can be purified and/or isolated from biological materials, such as a bacterial
plasmid containing a cloned segment of a sequence of interest. Suitable nucleic
acids can also be produced by amplification of templates or by synthesis. As a
nonlimiting illustration, polymerase chain reaction and/or in vitro transcription
are suitable nucleic acid amplification methods.

Probe design

The "probes" used here are specially designed to hybridize to the
complement of a corresponding "tag". Both the probe and tag sequences are
specially chosen, typically artificial oligonucleotide sequences that are unrelated 
to the rest of the target nucleic acid sequence (the amplified region containing the 
polymorphism). 

Various types of probes can be included in an array. An array includes "test
probes." Test probes can be oligonucleotides that range from about 5 to about 45
or 5 to about 500 nucleotides, more preferably from about 10 to about 40
nucleotides and most preferably from about 15 to about 40 nucleotides in length.
In particularly preferred embodiments the probes are 20 to 25 nucleotides in
length. In another embodiment, test probes are double or single stranded DNA
sequences. DNA sequences can be isolated or cloned from natural sources or
amplified from natural sources using natural nucleic acids as templates.
However, in situ synthesis of probes on the arrays is preferred. The probes have
sequences identical to particular tag sequences of the amplified DNA product
which they are designed to detect.

The term "perfect match probe" refers to a probe which has a sequence that 
is perfectly complementary to a particular target sequence. The probe is typically 
perfectly complementary to a portion (subsequence) of the target sequence. The 
perfect match probe can be a "test probe," a "normalization control probe," an
expression level control probe and the like. A perfect match control or perfect 
match probe is, however, distinguished from a "mismatch control" or "mismatch 
probe" or "mismatch control probe." 

In addition to test probes that bind the complements of tags of interest, the 
solid support can contain a number of control probes. The control probes fall into 
two categories: normalization controls and mismatch controls. 

Normalization controls are oligonucleotide or other nucleic acid probes that 
are complementary to labeled reference oligonucleotides or other nucleic acid 
sequences that are added to the nucleic acid sample. The signals obtained from 
the normalization controls after hybridization provide a control for variations in 
hybridization conditions, label intensity, "reading" efficiency, and other factors 
that may cause the signal of a perfect hybridization to vary between arrays. In a 
preferred embodiment, signals (e.g., fluorescence intensity) read from all other
probes in the array are divided by the signal (e.g., fluorescence intensity) from the
control probes, thereby normalizing the measurements. 
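
The normalization just described amounts to a simple division, sketched below in Python. Averaging over several normalization control probes is an assumption of this example (the text does not specify how multiple control signals are combined), and the probe names and intensities are hypothetical.

```python
def normalize_signals(probe_signals: dict, control_signals: list) -> dict:
    """Divide each probe signal by the mean normalization-control signal.

    Using the mean of the control probes is an illustrative choice; other
    summaries (e.g., a median) could be substituted.
    """
    if not control_signals:
        raise ValueError("at least one normalization control is required")
    control = sum(control_signals) / len(control_signals)
    if control <= 0:
        raise ValueError("normalization control signal must be positive")
    return {name: value / control for name, value in probe_signals.items()}


if __name__ == "__main__":
    raw = {"tag_1": 200.0, "tag_2": 50.0}   # hypothetical fluorescence readings
    print(normalize_signals(raw, [100.0, 100.0]))
```

Normalized values from different arrays can then be compared on a common scale, which is the purpose of the control described above.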

Virtually any probe can serve as a normalization control. However, it is 
recognized that hybridization efficiency varies with base composition and probe 
length. Preferred normalization probes are selected to reflect the average length 
of the other probes present in the array; however, they can be selected to cover a 
range of lengths. The normalization control(s) can also be selected to reflect the 
(average) base composition of the other probes in the array; however in a 
preferred embodiment, only one or a few normalization probes are used and they 
are selected such that they hybridize well (i.e., no secondary structure) and do not
match any target-specific probes.

Mismatch controls can also be provided for the probes to the complements
of tags or for normalization controls. The terms "mismatch control" or 
"mismatch probe" or "mismatch control probe" refer to a probe whose sequence 
is deliberately selected not to be identical to a particular tag. Mismatch controls
are oligonucleotide probes or other nucleic acid probes identical to their
corresponding test or control probes except for the presence of one or more 
mismatched bases. A mismatched base is a base selected so that it is not 
complementary to the corresponding base in the complement of a tag to which the 
probe would otherwise specifically hybridize. One or more mismatches are
selected such that under appropriate hybridization conditions (e.g., stringent
conditions) the test or control probe would be expected to hybridize with its target 
sequence, but the mismatch probe would not hybridize (or would hybridize to a 
significantly lesser extent). Preferred mismatch probes contain a central 
mismatch. Thus, for example, where a probe is a 20 mer, a corresponding
mismatch probe will have the identical sequence except for a single base
mismatch (e.g., substituting a G, a C, or a T for an A) at any of positions 6 
through 14 (the central mismatch). 
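
The construction of a mismatch probe with a single central mismatch can be sketched as follows. In this Python example the choice of replacement base (each base is swapped for its Watson-Crick partner) and the default position at the middle of the probe are assumptions made for illustration; the text above permits any non-identical base at any central position.

```python
# Illustrative substitution rule: replace each base with its complement.
SUBSTITUTE = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(perfect_match: str, position=None) -> str:
    """Return a copy of the probe with one base changed.

    position is 1-based, matching the numbering used in the text; when it is
    omitted, the middle of the probe is used (position 10 for a 20-mer).
    """
    pm = perfect_match.upper()
    if position is None:
        position = (len(pm) + 1) // 2
    if not 1 <= position <= len(pm):
        raise ValueError("position lies outside the probe")
    i = position - 1
    return pm[:i] + SUBSTITUTE[pm[i]] + pm[i + 1:]


if __name__ == "__main__":
    pm = "ACGTACGTACGTACGTACGT"   # hypothetical 20-mer perfect match probe
    print(mismatch_probe(pm))
```

The resulting probe has the same length as the perfect match probe and differs from it at exactly one position.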

For each mismatch control in a high-density array there typically exists a 
corresponding perfect match probe that is perfectly identical to the same
particular tag. The mismatch may comprise one or more bases. While the
mismatch(es) may be located anywhere in the mismatch probe, terminal
mismatches are less desirable, as a terminal mismatch is less likely to prevent 
hybridization of the tag. In a particularly preferred embodiment, the mismatch is 
located at or near the center of the probe such that the mismatch is most likely to
destabilize the duplex with the complement of the tag under the test hybridization
conditions. 

Mismatch probes provide a control for non-specific binding or
cross-hybridization to a nucleic acid in the sample other than the complement of the tag
to which the probe is directed. Mismatch probes thus indicate whether or not a
hybridization is specific. For example, if the complement of the tag is present,
the perfect match probes should be consistently brighter than the mismatch
probes. The difference in intensity between the perfect match and the mismatch
probe (I(pm) - I(mm)) provides a good measure of the concentration of the hybridized
material.

The solid support can also include sample preparation/amplification control
probes. These are probes that are complementary to subsequences of control
genes selected because they do not normally occur in the nucleic acids of the
particular biological sample being assayed. Suitable sample 
preparation/amplification control probes include, for example, probes to bacterial 
genes (e.g., Bio B) where the sample in question is from a eukaryote. 

In a preferred embodiment, oligonucleotide probes on the solid support are
selected to bind specifically to the complements of tags to which they are directed
with minimal non-specific binding or cross-hybridization under the particular
hybridization conditions utilized. Because high density arrays can contain in
excess of 1,000,000 different probes, it is possible to provide every probe of a
characteristic length that binds to a particular nucleic acid sequence. Thus, for
example, the high density array can contain every possible 20-mer sequence
complementary to an IL-2 mRNA. However, there may exist 20-mer
subsequences that are not unique to the IL-2 mRNA. Probes directed to these
subsequences are expected to cross-hybridize with occurrences of their
complementary sequence in other regions of the sample genome. Similarly, other
probes simply may not hybridize effectively under the hybridization conditions
(e.g., due to secondary structure, or interactions with the substrate or other
probes). Thus, in a preferred embodiment, the probes that show such poor
specificity or hybridization efficiency are identified and excluded either in the
high density array itself (e.g., during fabrication of the array) or in the post-
hybridization data analysis.

Forming High Density Arrays

High density arrays are particularly useful for monitoring the presence of 
allelic markers. The fabrication and application of high density arrays in gene
expression monitoring have been disclosed previously in, for example, WO
97/10365, WO 92/10588, U.S. Application Ser. No. 08/772,376 filed December
23, 1996; serial number 08/529,115 filed on September 15, 1995; serial number
08/168,904 filed December 15, 1993; serial number 07/624,114 filed on
December 6, 1990, serial number 07/362,901 filed June 7, 1990, all incorporated
herein for all purposes by reference. In some embodiments using high density
arrays, high density oligonucleotide arrays are synthesized using methods such
as the Very Large Scale Immobilized Polymer Synthesis (VLSIPS) disclosed in
U.S. Pat. No. 5,445,934 incorporated herein for all purposes by reference. Each
oligonucleotide occupies a known location on a substrate. A nucleic acid target
sample is hybridized with a high density array of oligonucleotides and then the
amount of target nucleic acids hybridized to each probe in the array is quantified.

Synthesized oligonucleotide arrays are particularly preferred for this 
invention. Oligonucleotide arrays have numerous advantages over other 
methods, such as efficiency of production, reduced intra- and inter-array
variability, increased information content, and high signal-to-noise ratio. 

Preferred high density arrays comprise greater than about 100, preferably
greater than about 1000, more preferably greater than about 16,000, and most
preferably greater than 65,000 or 250,000 or even greater than about 1,000,000
different oligonucleotide probes, preferably in less than 1 cm² of surface area.
The oligonucleotide probes range from about 5 to about 50 or about 500
nucleotides, more preferably from about 10 to about 40 nucleotides, and most
preferably from about 15 to about 40 nucleotides in length.

Methods of forming high density arrays of oligonucleotides, peptides and 
other polymer sequences with a minimal number of synthetic steps are known. 
The oligonucleotide analogue array can be synthesized on a solid substrate by a
variety of methods, including, but not limited to, light-directed chemical coupling
and mechanically directed coupling. See Pirrung et al., U.S. Patent No.
5,143,854 (see also PCT Application No. WO 90/15070) and Fodor et al., PCT
Publication Nos. WO 92/10092 and WO 93/09668 and U.S. Ser. No. 07/980,523,
which disclose methods of forming vast arrays of peptides, oligonucleotides and
other molecules using, for example, light-directed synthesis techniques. See also,
Fodor et al., Science, 251, 767-77 (1991). These procedures for synthesis of
polymer arrays are now referred to as VLSIPS™ procedures. Using the
VLSIPS™ approach, one heterogeneous array of polymers is converted, through
simultaneous coupling at a number of reaction sites, into a different
heterogeneous array. See, U.S. Application Serial Nos. 07/796,243 and
07/980,523.

The development of VLSIPS™ technology as described in the above-noted 
U.S. Patent No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 
92/10092, is considered pioneering technology in the fields of combinatorial 
synthesis and screening of combinatorial libraries. More recently, patent
application Serial No. 08/082,937, filed June 25, 1993, describes methods for
making arrays of oligonucleotide probes that can be used to check or determine a
partial or complete sequence of a target nucleic acid and to detect the presence of 
a nucleic acid containing a specific oligonucleotide sequence. 

In brief, the light-directed combinatorial synthesis of oligonucleotide arrays
on a glass surface proceeds using automated phosphoramidite chemistry and chip
masking techniques. In one specific implementation, a glass surface is
derivatized with a silane reagent containing a functional group, e.g., a hydroxyl or
amine group blocked by a photolabile protecting group. Photolysis through a
photolithographic mask is used selectively to expose functional groups which are
then ready to react with incoming 5'-photoprotected nucleoside
phosphoramidites. The phosphoramidites react only with those sites which are
illuminated (and thus exposed by removal of the photolabile blocking group).
Thus, the phosphoramidites only add to those areas selectively exposed from the
preceding step. These steps are repeated until the desired array of sequences has
been synthesized on the solid surface. Combinatorial synthesis of different
oligonucleotide analogues at different locations on the array is determined by the
pattern of illumination during synthesis and the order of addition of coupling
reagents.
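
The dependence of the final array on the pattern of illumination and the order of reagent addition can be illustrated with a small simulation. This is a sketch only: the function name, the representation of a mask as a set of site indices, and the four-site example are assumptions, and the strings record the order of coupling rather than any particular chemical orientation.

```python
def light_directed_synthesis(n_sites, cycles):
    """Simulate mask-directed coupling.

    cycles is a list of (mask, base) pairs; mask is the set of site
    indices illuminated (deprotected) before the given base is coupled.
    """
    sequences = [""] * n_sites
    for mask, base in cycles:
        for site in mask:
            if not 0 <= site < n_sites:
                raise ValueError(f"site {site} is outside the array")
            # Only illuminated sites react with the incoming monomer.
            sequences[site] += base
    return sequences


# Four coupling cycles build all four dimers from {A, C} x {G, T}.
cycles = [({0, 1}, "A"), ({2, 3}, "C"), ({0, 2}, "G"), ({1, 3}, "T")]
array = light_directed_synthesis(4, cycles)
print(array)  # ['AG', 'AT', 'CG', 'CT']
```

Four coupling steps yield four distinct sequences at known locations, which is the sense in which the number of synthetic steps is kept small.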

In the event that an oligonucleotide analogue with a polyamide backbone is
used in the VLSIPS™ procedure, it is generally inappropriate to use
phosphoramidite chemistry to perform the synthetic steps, since the monomers do
not attach to one another via a phosphate linkage. Instead, peptide synthetic
methods are substituted. See, e.g., Pirrung et al., U.S. Pat. No. 5,143,854.
Peptide nucleic acids, which comprise a polyamide backbone and the bases found in
naturally occurring nucleosides, are commercially available from, e.g., Biosearch,
Inc. (Bedford, MA). Peptide nucleic acids are capable of binding to
nucleic acids with high specificity, and are considered "oligonucleotide
analogues" for purposes of this disclosure.

Additional methods which can be used to generate an array of 
oligonucleotides on a single substrate are described in co-pending Applications 
Ser. No. 07/980,523, filed November 20, 1992, and 07/796,243, filed November 
22, 1991 and in PCT Publication No. WO 93/09668. In the methods disclosed in 
these applications, reagents are delivered to the substrate by either (1) flowing 
within a channel defined on predefined regions or (2) "spotting" on predefined 
regions or (3) through the use of photoresist. However, other approaches, as well 
as combinations of spotting and flowing, can be employed. In each instance, 
certain activated regions of the substrate are mechanically separated from other 
regions when the monomer solutions are delivered to the various reaction sites. 

A typical "flow channel" method applied to the compounds and libraries of 
the present invention can generally be described as follows. Diverse polymer 
sequences are synthesized at selected regions of a substrate or solid support by 
forming flow channels on a surface of the substrate through which appropriate 
reagents flow or in which appropriate reagents are placed. For example, assume a 
monomer "A" is to be bound to the substrate in a first group of selected regions. 
If necessary, all or part of the surface of the substrate in all or a part of 
the selected regions is activated for binding by, for example, flowing appropriate 
reagents through all or some of the channels, or by washing the entire substrate 
with appropriate reagents. After placement of a channel block on the surface of 
the substrate, a reagent having the monomer A flows through or is placed in all or 
some of the channel(s). The channels provide fluid contact to the first selected 
regions, thereby binding the monomer A on the substrate directly or indirectly 
(via a spacer) in the first selected regions.
Thereafter, a monomer "B" is coupled to second selected regions, some of
which can be included among the first selected regions. The second selected
regions will be in fluid contact with a second flow channel(s) through translation,
rotation, or replacement of the channel block on the surface of the substrate;
through opening or closing a selected valve; or through deposition of a layer of
chemical or photoresist. If necessary, a step is performed for activating at least
the second regions. Thereafter, the monomer B is flowed through or placed in the
second flow channel(s), binding monomer B at the second selected locations. In
this particular example, the resulting sequences bound to the substrate at this
stage of processing will be, for example, A, B, and AB. The process is repeated
to form a vast array of sequences of desired length at known locations on the
substrate.

After the substrate is activated, monomer A can be flowed through some of 
the channels, monomer B can be flowed through other channels, a monomer C
can be flowed through still other channels, etc. In this manner, many or all of the
reaction regions are reacted with a monomer before the channel block must be
moved or the substrate must be washed and/or reactivated. By making use of 
many or all of the available reaction regions simultaneously, the number of 
washing and activation steps can be minimized. 

One of skill in the art will recognize that there are alternative methods of
forming channels or otherwise protecting a portion of the surface of the substrate.
For example, according to some embodiments, a protective coating such as a
hydrophilic or hydrophobic coating (depending upon the nature of the solvent) is
utilized over portions of the substrate to be protected, sometimes in combination
with materials that facilitate wetting by the reactant solution in other regions. In
this manner, the flowing solutions are further prevented from passing outside of
their designated flow paths.

High density nucleic acid arrays can be fabricated by depositing
presynthesized or natural nucleic acids in predetermined positions. Synthesized
or natural nucleic acids are deposited on specific locations of a substrate by
light-directed targeting and oligonucleotide-directed targeting. Nucleic acids can also
be directed to specific locations in much the same manner as the flow channel
methods. For example, a nucleic acid A can be delivered to and coupled with a
first group of reaction regions which have been appropriately activated.
Thereafter, a nucleic acid B can be delivered to and reacted with a second group
of activated reaction regions. Nucleic acids are deposited in selected regions.
Another embodiment uses a dispenser that moves from region to region to deposit
nucleic acids in specific spots. Typical dispensers include a micropipette or
capillary pin to deliver nucleic acid to the substrate and a robotic system to
control the position of the micropipette with respect to the substrate. In other
embodiments, the dispenser includes a series of tubes, a manifold, an array of
pipettes or capillary pins, or the like so that various reagents can be delivered to
the reaction regions simultaneously.

Hybridization Conditions

The term "stringent conditions" refers to conditions under which a probe
will hybridize to the complement of its tag subsequence, with only insubstantial
hybridization to other sequences, or with hybridization to other sequences at a
level such that the difference may be identified. Stringent conditions are
sequence-dependent and will be
different in different circumstances. Longer sequences hybridize specifically at
higher temperatures. Generally, stringent conditions are selected to be about 5°C
lower than the thermal melting point (Tm) for the specific sequence at a defined
ionic strength and pH.

The Tm is the temperature, under defined ionic strength, pH, and nucleic
acid concentration, at which 50% of the probes complementary to the target
sequence hybridize to the target sequence at equilibrium. (As the target sequences
are generally present in excess, at Tm, 50% of the probes are occupied at
equilibrium.) Typically, stringent conditions will be those in which the salt
concentration is at least about 0.01 to 1.0 M of a Na or other salt at
pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10
to 50 nucleotides). Stringent conditions can also be achieved with the addition of
destabilizing agents such as formamide.
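
The relationship between Tm and a stringent temperature can be sketched numerically. The disclosure does not specify how Tm is obtained; the sketch below assumes the "Wallace rule" (2°C per A or T, 4°C per G or C), a rough rule of thumb for short oligonucleotides, purely for illustration. The function names and the example sequence are likewise hypothetical.

```python
def wallace_tm(seq):
    """Rough Tm estimate in degrees C for a short oligonucleotide
    (assumed rule of thumb, not taken from the disclosure)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(tm_celsius, offset=5.0):
    """Temperature about `offset` degrees C below the melting point."""
    return tm_celsius - offset


probe = "ACGTACGTAC"  # hypothetical 10-mer: 5 A/T and 5 G/C bases
tm = wallace_tm(probe)
print(tm, stringent_temperature(tm))  # 30.0 25.0
```

Because the estimate ignores ionic strength, pH, and probe concentration, it is suitable only as a starting point for empirical optimization.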

The phrase "hybridizing specifically to" refers to the binding, duplexing, or
hybridizing of a molecule substantially to or only to a particular nucleotide
sequence or sequences under stringent conditions when that sequence is present
in a complex mixture (e.g., total cellular) of DNA or RNA. It is generally
recognized that nucleic acids are denatured by increasing the temperature or
decreasing the salt concentration of the buffer containing the nucleic acids.
Under low stringency conditions (e.g., low temperature and/or high salt) hybrid
duplexes (e.g., DNA:DNA, RNA:RNA, or RNA:DNA) will form even where the
annealed sequences are not perfectly complementary. Thus, specificity of
hybridization is reduced at lower stringency. Conversely, at higher stringency
(e.g., higher temperature or lower salt) successful hybridization requires fewer
mismatches.

One of skill in the art will appreciate that hybridization conditions can be 
selected to provide any degree of stringency. In a preferred embodiment, 
hybridization is performed at low stringency, in this case in 6X SSPE-T at 37°C
(0.005% Triton X-100), to ensure hybridization, and then subsequent washes are
performed at higher stringency (e.g., 1X SSPE-T at 37°C) to eliminate
mismatched hybrid duplexes. Successive washes can be performed at
increasingly higher stringency (e.g., down to as low as 0.25X SSPE-T at 37°C to
50°C) until a desired level of hybridization specificity is obtained. Stringency can
also be increased by addition of agents such as formamide. Hybridization
specificity can be evaluated by comparison of hybridization to the test probes
with hybridization to the various controls that can be present (e.g., expression 
level control, normalization control, mismatch controls, etc.). 

In general, there is a tradeoff between hybridization specificity (stringency)
and signal intensity. Thus, in a preferred embodiment, the wash is performed at
the highest stringency that produces consistent results and that provides a signal
intensity greater than approximately 10% of the background intensity. Thus, in a
preferred embodiment, the hybridized array can be washed at successively higher
stringency solutions and read between each wash. Analysis of the data sets thus
produced will reveal a wash stringency above which the hybridization pattern is
not appreciably altered and which provides adequate signal for the particular
oligonucleotide probes of interest.
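
One possible way to analyze such a series of reads is sketched below. The disclosure does not define "not appreciably altered"; the sketch assumes that each read is scaled to its brightest probe and that a pattern is stable when no scaled intensity changes by more than a chosen tolerance at the next wash. All names, the tolerance, and the example intensities are hypothetical.

```python
def pattern_change(read_a, read_b):
    """Largest change between two reads after scaling each to its own maximum."""
    max_a, max_b = max(read_a), max(read_b)
    if max_a == 0 or max_b == 0:
        return float("inf")  # no signal left to compare
    return max(abs(a / max_a - b / max_b) for a, b in zip(read_a, read_b))


def select_wash(reads, probes_of_interest, min_signal, tolerance=0.05):
    """Index of the first wash after which the pattern is stable and every
    probe of interest still reads at least min_signal; None if there is none.
    reads must be ordered from lowest to highest stringency."""
    for i in range(len(reads) - 1):
        stable = pattern_change(reads[i], reads[i + 1]) <= tolerance
        adequate = all(reads[i][p] >= min_signal for p in probes_of_interest)
        if stable and adequate:
            return i
    return None


# Rows: successive washes; columns: perfect match, mismatch, blank feature.
reads = [
    [1000, 800, 50],
    [900, 300, 45],
    [850, 100, 42],
    [800, 95, 40],
]
print(select_wash(reads, probes_of_interest=[0], min_signal=100))  # 2
```

In this example the mismatch signal keeps falling until the third wash (index 2), after which the scaled pattern no longer changes appreciably.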

The stability of duplexes formed between RNAs or DNAs is generally in
the order of RNA:RNA > RNA:DNA > DNA:DNA, in solution. Long probes
have better duplex stability with a target, but poorer mismatch discrimination
than shorter probes (mismatch discrimination refers to the measured hybridization
signal ratio between a perfect match probe and a single base mismatch probe).
Shorter probes (e.g., 8-mers) discriminate mismatches very well, but the overall 
duplex stability is low. 

Altering the thermal stability (Tm) of the duplex formed between the target
and the probe using, e.g., known oligonucleotide analogues allows for
optimization of duplex stability and mismatch discrimination. One useful aspect
of altering the Tm arises from the fact that adenine-thymine (A-T) duplexes have
a lower Tm than guanine-cytosine (G-C) duplexes, due in part to the fact that the
A-T duplexes have two hydrogen bonds per base pair, while the G-C duplexes
have three hydrogen bonds per base pair. In heterogeneous oligonucleotide
arrays in which there is a non-uniform distribution of bases, it is not generally
possible to optimize hybridization for each oligonucleotide probe simultaneously.
Thus, in some embodiments, it is desirable to selectively destabilize G-C
duplexes and/or to increase the stability of A-T duplexes. This can be
accomplished, e.g., by substituting guanine residues in the probes of an array
which form G-C duplexes with hypoxanthine, or by substituting adenine residues
in probes which form A-T duplexes with 2,6-diaminopurine or by using
tetramethyl ammonium chloride (TMACl) in place of NaCl.

Altered duplex stability conferred by using oligonucleotide analogue probes
can be ascertained by following, e.g., fluorescence signal intensity of
oligonucleotide analogue arrays hybridized with a target oligonucleotide over
time. The data allow optimization of specific hybridization conditions at, e.g., 
room temperature. 

Another way of verifying altered duplex stability is by following the signal
intensity generated upon hybridization with time. Previous experiments using
DNA targets and DNA chips have shown that signal intensity increases with time,
and that the more stable duplexes generate higher signal intensities faster than
less stable duplexes. The signals reach a plateau or "saturate" after a certain
amount of time due to all of the binding sites becoming occupied. These data
allow for optimization of hybridization, and determination of the best conditions
at a specified temperature.

Methods of optimizing hybridization conditions are well known to those of 
skill in the art (see, e.g., Laboratory Techniques in Biochemistry and Molecular 
Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. 
Elsevier, N.Y., (1993)). 

Signal Detection

The hybridized nucleic acids can be detected by detecting one or more 
labels attached to the target nucleic acids. The labels can be incorporated by any 
of a number of means well known to those of skill in the art. However, in a 
preferred embodiment, the label is incorporated by labeling the amplified DNA
product by carrying out a terminal transferase reaction using a fluorescently
labeled nucleotide.

Detectable labels suitable for use in the present invention include any 
composition detectable by spectroscopic, photochemical, biochemical, 
immunochemical, electrical, optical, or chemical means. Useful labels in the
present invention include high affinity binding partners such as biotin for staining
with labeled streptavidin conjugate, magnetic beads (e.g., Dynabeads™),
fluorescent dyes (e.g., fluorescein, Texas red, rhodamine, green fluorescent
protein, and the like), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P), enzymes
(e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in
an ELISA), epitopic labels, such as tags, and colorimetric labels such as colloidal
gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.)
beads. Patents teaching the use of such labels include U.S. Patent Nos. 
3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and 
4,366,241. 

Means of detecting such labels are well known to those of skill in the art.
Thus, for example, radiolabels can be detected using photographic film or
scintillation counters; fluorescent markers can be detected using a photodetector
to detect emitted light. Enzymatic labels are typically detected by providing the
enzyme with a substrate and detecting the reaction product produced by the action
of the enzyme on the substrate, and colorimetric labels are detected by simply
visualizing the colored label. One method uses a colloidal gold label that can be
detected by measuring scattered light.

The label can be added to the amplification products prior to, or after the 
hybridization. So-called "direct labels" are detectable labels that are directly
attached to or incorporated into the tagged nucleic acids prior to hybridization. In
contrast, so-called "indirect labels" are joined to the hybrid duplex after
hybridization. Often, the indirect label is attached to a binding moiety that has
been attached to the amplified nucleic acid prior to the hybridization. Thus, for
example, the amplified nucleic acid can be biotinylated before the hybridization.
After hybridization, an avidin-conjugated fluorophore will bind the biotin-bearing
hybrid duplexes, providing a label that is easily detected. For a detailed review of
methods of labeling nucleic acids and detecting labeled hybridized nucleic acids
see Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24:
Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y., (1993).

Means of detecting labeled nucleic acids hybridized to the probes of the
array are known to those of skill in the art. Thus, for example, where a
colorimetric label is used, simple visualization of the label is sufficient. Where a
radioactive labeled probe is used, detection of the radiation (e.g., with
photographic film or a solid state detector) is sufficient.

Detection of target nucleic acids which are labeled with a fluorescent label
can be accomplished with fluorescence microscopy. The hybridized array can be
excited with a light source at the excitation wavelength of the particular
fluorescent label and the resulting fluorescence at the emission wavelength is 
detected. The excitation light source can be a laser appropriate for the excitation 
of the fluorescent label. 

The confocal microscope can be automated with a computer-controlled
stage to automatically scan the entire high density array, i.e., to sequentially
examine individual probes or adjacent groups of probes in a systematic manner
until all probes have been examined. Similarly, the microscope can be equipped
with a phototransducer (e.g., a photomultiplier, a solid state array, a CCD camera,
etc.) attached to an automated data acquisition system to automatically record the
fluorescence signal produced by hybridization to each oligonucleotide probe on
the array. Such automated systems are described at length in U.S. Patent No.
5,143,854, PCT Application WO 92/10092, and copending U.S. Application Ser.
No. 08/195,889, filed on February 10, 1994. Use of laser illumination in
conjunction with automated confocal microscopy for signal detection permits
detection at a resolution of better than about 100 µm, more preferably better than
about 50 µm, and most preferably better than about 25 µm.

Two different fluorescent labels can be used in order to distinguish two 
alleles at each polymorphic locus examined. In such a case, the array can be 
scanned two times. During the first scan, the excitation and emission 
wavelengths are set as required to detect one of the two fluorescent labels. For 
the second scan, the excitation and emission wavelengths are set as required to 
detect the second fluorescent label. When the results from both scans are 
compared, the genotype identification or allele frequency can be determined.

Quantification and Determination of Genotypes

The term "quantifying" when used in the context of quantifying 
hybridization of a nucleic acid sequence or subsequence can refer to absolute or 
to relative quantification. Absolute quantification can be accomplished by 
inclusion of known concentration(s) of one or more target nucleic acids (e.g., 
control nucleic acids such as Bio B, or known amounts of the target nucleic acids
themselves) and referencing the hybridization intensity of unknowns with the 
known target nucleic acids (e.g., through generation of a standard curve). 
Alternatively, relative quantification can be accomplished by comparison of 
hybridization signals between two or more genes, or between two or more 
treatments to quantify the changes in hybridization intensity and, by implication, 
the frequency of an allele. Relative quantification can also be used to merely 
detect the presence or absence of an allele in the target nucleic acids. In one
embodiment, for example, the presence or absence of an allelic form of a
polymorphic locus can be determined by measuring the quantity of the labeled 
tag at the known location in the array, i.e., on the solid support, of the 
corresponding probe. 
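
Absolute quantification against a standard curve can be illustrated as follows. The sketch assumes that intensity responds linearly to concentration over the range of the controls, which the disclosure does not state; the function names and the control values are hypothetical.

```python
def fit_standard_curve(concentrations, intensities):
    """Least-squares line through control points; returns (slope, intercept)."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, intensities))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_concentration(intensity, slope, intercept):
    """Invert the fitted line to read a concentration from an intensity."""
    return (intensity - intercept) / slope


# Hypothetical spiked controls: concentration (pM) and measured intensity.
control_pm = [1.0, 2.0, 4.0, 8.0]
control_signal = [150.0, 250.0, 450.0, 850.0]
slope, intercept = fit_standard_curve(control_pm, control_signal)
unknown = estimate_concentration(650.0, slope, intercept)
print(round(unknown, 6))  # 6.0
```

Readings outside the range spanned by the controls would be extrapolations and should be treated with corresponding caution.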

A preferred quantifying method is to use a confocal microscope and
fluorescent labels. The GeneChip® system (Affymetrix, Santa Clara, CA) is
particularly suitable for quantifying the hybridization; however, it will be
apparent to those of skill in the art that any similar system or other effectively
equivalent detection method can also be used.

Methods for evaluating the hybridization results vary with the nature of the
specific probes used, as well as the controls. Simple quantification of the
fluorescence intensity for each probe can be determined. This can be
accomplished simply by measuring signal strength at each location (representing
a different probe) on the high density array (e.g., where the label is a fluorescent
label, detection of the fluorescence intensity produced by a fixed excitation
illumination at each location on the array).

One of skill in the art, however, will appreciate that hybridization signals 
will vary in strength with efficiency of hybridization, the amount of label on the 
sample nucleic acid and the amount of the particular nucleic acid in the sample.
Typically nucleic acids present at very low levels (e.g., < 1 pM) will show a very
weak signal. At some low level of concentration, the signal becomes virtually
indistinguishable from background. In evaluating the hybridization data, a 
threshold intensity value can be selected below which a signal is counted as being 
essentially indistinguishable from background. 

The terms "background" or "background signal intensity" refer to
hybridization signals resulting from non-specific binding, or other interactions,
between the labeled target nucleic acids and components of the oligonucleotide
array (e.g., the oligonucleotide probes, control probes, the array substrate, etc.).
Background signals may also be produced by intrinsic fluorescence of the array
components themselves. A single background signal can be calculated for the
entire array, or a different background signal may be calculated for each target
nucleic acid. In a preferred embodiment, background is calculated as the average
hybridization signal intensity for the lowest 5% to 10% of the probes in the array,
or, where a different background signal is calculated for each target allele, for the
lowest 5% to 10% of the probes for each allele. However, where the probes to a
particular allele hybridize well and thus appear to be specifically binding to a
target sequence, they should not be used in a background signal calculation.
Alternatively, background may be calculated as the average hybridization signal
intensity produced by hybridization to probes that are not complementary to any
sequence found in the sample (e.g., probes directed to nucleic acids of the
opposite sense or to genes not found in the sample, such as bacterial genes where
the sample is mammalian nucleic acids). Background can also be calculated as
the average signal intensity produced by regions of the array that lack any probes
at all. In a preferred embodiment, background signal is reduced by the use of a
detergent (e.g., C-TAB) or a blocking reagent (e.g., sperm DNA, cot-1 DNA,
etc.) during the hybridization to reduce non-specific binding. In a particularly
preferred embodiment, the hybridization is performed in the presence of about 0.5
mg/ml DNA (e.g., herring sperm DNA). The use of blocking agents in
hybridization is well known to those of skill in the art (see, e.g., Chapter 8 in P.
Tijssen, supra).
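
The first of these background calculations, the average of the lowest 5% to 10% of probe intensities, can be sketched as below. The function name, the rule of always using at least one probe, and the example data are assumptions; the exclusion of probes that appear to bind specifically is left to the caller.

```python
def background_intensity(intensities, fraction=0.05):
    """Mean of the lowest `fraction` of probe intensities (at least one probe)."""
    if not intensities:
        raise ValueError("no intensities supplied")
    ordered = sorted(intensities)
    count = max(1, int(len(ordered) * fraction))
    lowest = ordered[:count]
    return sum(lowest) / len(lowest)


# Hypothetical array of 100 probes with intensities 1..100.
signals = list(range(1, 101))
print(background_intensity(signals, 0.05))  # 3.0 (mean of 1..5)
print(background_intensity(signals, 0.10))  # 5.5 (mean of 1..10)
```

A per-allele background is obtained in the same way by passing only the intensities of the probes directed to that allele.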

The high density array can include mismatch controls. In a preferred
embodiment, there is a mismatch control having a central mismatch for every
probe in the array, except the normalization controls. It is expected that after
washing in stringent conditions, where a perfect match would be expected to
hybridize to the probe, but not to the mismatch, the signal from the mismatch
controls should only reflect non-specific binding or the presence in the sample of
a nucleic acid that hybridizes with the mismatch. Where both the probe in
question and its corresponding mismatch control show high signals, or the
mismatch shows a higher signal than its corresponding test probe, there is a
problem with the hybridization and the signal from those probes is ignored. For a
given polymorphic locus, the difference in hybridization signal intensity (I(allele1) -
I(allele2)) between an allele-specific probe (perfect match probe) for a first allele and
the corresponding probe for a second allele or an average of several other alleles
(or other mismatch control probe) is a measure of the presence of or
concentration of the first allele. Thus, in a preferred embodiment, the signal of
the mismatch probe is subtracted from the signal for its corresponding test probe
to provide a measure of the signal due to specific binding of the test probe.

The concentration of a particular sequence can then be determined by
measuring the signal intensity of each of the probes that bind specifically to that
gene and normalizing to the normalization controls. Where the signal from the
probes is greater than the mismatch, the mismatch is subtracted. Where the
mismatch intensity is equal to or greater than its corresponding test probe, the
signal is ignored (i.e., the signal cannot be evaluated).
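
The mismatch subtraction and normalization just described can be illustrated with a minimal sketch. The function names, the averaging of evaluable probe pairs, and the division by the mean of the normalization controls are assumptions made for illustration only; the text above specifies only that the mismatch is subtracted, that unevaluable pairs are ignored, and that the result is normalized to the normalization controls.

```python
from statistics import mean


def specific_signal(perfect_match, mismatch):
    """Return the test-probe signal minus its mismatch control.

    Returns None when the mismatch is equal to or greater than the
    test probe, i.e. when the signal cannot be evaluated.
    """
    if mismatch >= perfect_match:
        return None
    return perfect_match - mismatch


def normalized_signal(probe_pairs, normalization_controls):
    """Average the evaluable differences and scale by the controls.

    Averaging and dividing by the control mean are illustrative
    assumptions, not a procedure stated in the specification.
    """
    differences = (specific_signal(pm, mm) for pm, mm in probe_pairs)
    usable = [d for d in differences if d is not None]
    if not usable:
        return None  # no probe pair for this sequence could be evaluated
    return mean(usable) / mean(normalization_controls)


# Hypothetical intensities: (test probe, mismatch control) per probe pair.
pairs = [(1200.0, 200.0), (900.0, 100.0), (300.0, 450.0)]
controls = [500.0, 500.0]
result = normalized_signal(pairs, controls)
```

In this hypothetical data set the third pair is ignored because its mismatch control is brighter than the test probe, so only the first two pairs contribute to the result.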

For each polymorphic locus analyzed, the genotype can be unambiguously
determined by comparing the hybridization patterns obtained to the known
locations of the allele-specific probes. When analyzing a DNA sample from a
single individual, significant detection of hybridization to a probe indicates the
presence of the corresponding allelic form in the genome of the individual.
Marginal detection of hybridization, indicated by an intermediate positive result
(e.g., less than 1%, or from 1-5%, or from 1-10%, or from 2-10%, or from 5-10%,
or from 1-20%, or from 2-20%, or from 5-20%, or from 10-20% of the average of
all positive hybridization results obtained for the entire array) may indicate either
cross-hybridization or cross-amplification.
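
As an illustration of the scoring just described, the following sketch separates significant from marginal hybridization for the allele-specific probes at one locus. The function name, the input layout, and the single 10% cut-off (one of the ranges listed above, chosen arbitrarily) are assumptions for illustration and are not prescribed by the method.

```python
def call_genotype(allele_signals, array_positive_mean, marginal_fraction=0.10):
    """Split alleles at one locus into 'present' and 'marginal' calls.

    allele_signals maps an allele label to its background-corrected
    signal (None if the probe could not be evaluated).
    array_positive_mean is the average of all positive hybridization
    results for the entire array. marginal_fraction is an assumed
    cut-off expressed as a fraction of that average.
    """
    cutoff = marginal_fraction * array_positive_mean
    present, marginal = [], []
    for allele, signal in sorted(allele_signals.items()):
        if signal is None or signal <= 0:
            continue  # no detectable hybridization for this allele
        if signal > cutoff:
            present.append(allele)   # significant detection
        else:
            marginal.append(allele)  # possible cross-hybridization
    return present, marginal


# Hypothetical single-individual samples at one biallelic locus.
homozygote = call_genotype({"A": 950.0, "G": 40.0}, array_positive_mean=1000.0)
heterozygote = call_genotype({"A": 900.0, "G": 870.0}, array_positive_mean=1000.0)
```

Under these assumed values the first sample is scored as carrying only allele A, with the weak G signal flagged as marginal, while the second sample is scored as carrying both alleles.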

Further procedures for data analysis are disclosed in U.S. Application 
08/772,376, previously incorporated for all purposes. 
Determination of Allele Frequency

The "allele frequency" is the frequency with which a selected allelic form
of a gene exists within a population or selected group of organisms. Allele
frequency is determined from the relative intensity of hybridization to probes.
The frequency of a selected allelic form can be quantified as the detected number
of copies of the selected allele divided by the total number of alleles of the gene
possessed by the individuals tested. Statistical methods are available to
determine whether the number of individuals tested is representative of a given
population. The ratio of different allelic forms in a population can also be
determined using the methods described above. For example, if the DNA sample
analyzed contains a mixture of DNA from a population of individuals, then the
ratio of different allelic forms in the population is measured directly as the ratio
of the relative intensities of the label which hybridizes to the probes
corresponding to those allelic forms.
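
The two ways of arriving at an allele frequency described above can be sketched as follows. Both function names and all input values are hypothetical, and the pooled-sample calculation assumes that label intensity is proportional to allele copy number, which in practice would need to be verified for each probe pair.

```python
def allele_frequencies_from_pool(intensities):
    """Estimate allele frequencies in a pooled DNA sample.

    intensities maps each allelic form to the label intensity measured
    at its probe. Each frequency is that intensity divided by the sum
    over all allelic forms at the locus.
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("no positive hybridization signal at this locus")
    return {allele: value / total for allele, value in intensities.items()}


def allele_frequency_from_genotypes(genotypes, allele):
    """Count copies of one allele over all alleles in the tested individuals."""
    copies = sum(genotype.count(allele) for genotype in genotypes)
    total_alleles = sum(len(genotype) for genotype in genotypes)
    return copies / total_alleles


# Hypothetical pooled-sample intensities for a biallelic locus.
pool = allele_frequencies_from_pool({"A": 750.0, "G": 250.0})

# Hypothetical genotypes scored individually for four diploid individuals.
counted = allele_frequency_from_genotypes(
    [("A", "A"), ("A", "G"), ("G", "G"), ("A", "G")], "A"
)
```

The first function corresponds to the direct intensity ratio measured on a mixed sample; the second corresponds to the definition of allele frequency as copies of the selected allele divided by the total number of alleles in the individuals tested.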

It is understood that the examples and embodiments described herein are
for illustrative purposes only and that various modifications or changes will be
suggested to persons skilled in the art and are to be included within the spirit and
purview of this application and scope of the appended claims. All publications,
patents, and patent applications cited herein are hereby incorporated by reference
for all purposes.
WE CLAIM:

1. A method to determine a nucleotide at a polymorphic locus in a nucleic
acid sample, comprising the steps of: 

amplifying a region of DNA comprising a polymorphic locus in the sample 
to form amplified DNA products using a primer which terminates at its 3' end at 
the polymorphic locus, wherein the primer comprises a 3' portion which is 
complementary to the region of DNA and a 5' portion which is identical in 
sequence to all or part of a probe on a solid support and not complementary to the 
region of DNA; 

labeling the amplified DNA products to form labeled amplified DNA
products; and

hybridizing the labeled amplified DNA products to the probe on the solid 
support. 

2. The method of claim 1 wherein the step of labeling couples a labeled 
nucleotide to a 3' end. 

3. The method of claim 1 wherein terminal transferase catalyzes the step of 
labeling. 

4. The method of claim 1 wherein the nucleotide is fluorescently labeled. 

5. The method of claim 1 wherein the nucleotide is radioactively labeled. 

6. The method of claim 1 wherein the nucleotide is enzymatically labeled. 

7. The method of claim 1 wherein the nucleotide is epitopically labeled. 

8. The method of claim 4 further comprising the step of: 
optically detecting fluorescent label on the solid support. 

9. The method of claim 8 wherein two primer pairs are employed, wherein the 
first primer of each of the first and second pairs of primers terminate at 
their 3' ends in distinct nucleotides, and wherein each 5' portion of each of 
said first primers is identical in sequence to all or part of a distinct probe at 
a known location on the solid support. 

10. The method of claim 8 wherein quantities of fluorescent label at known 
locations on the solid support are compared and a ratio of nucleotides at the 
polymorphic locus in the sample is determined.

11. The method of claim 10 wherein the ratios of nucleotides at two or more
polymorphic loci are determined simultaneously.

12. The method of claim 1 wherein the sample comprises DNA from two or 
more individuals. 

13. The method of claim 1 wherein two or more regions of DNA, each of 
which comprises a polymorphic locus, are amplified in a single reaction 
mixture. 

14. The method of claim 1 wherein the solid support is beads. 

15. The method of claim 1 wherein the solid support is a microtiter dish. 

16. The method of claim 1 wherein the solid support is a high density array. 

17. A pair of primers which specifically amplify an allelic form of a 
polymorphic locus wherein a first primer comprises a 3' portion and a 5' 
portion, wherein the 3' portion is complementary to a region of DNA 
comprising the polymorphic locus and the 5' portion is identical in 
sequence to all or part of a probe on a solid support which is not 
complementary to the region of DNA, wherein the first primer terminates in 
a 3' nucleotide which is complementary to a selected allelic form of the 
polymorphic locus. 

18. A kit comprising in a single container two or more of the pairs of primers 
of claim 17. 

19. A kit comprising in a single container: 
a pair of primers of claim 17; and

a solid support comprising at least two probes which are attached to known
locations on the solid support, wherein the probe comprises a sequence which is
identical to the 5' portion of the first primer.

20. The kit of claim 19 wherein the solid support is beads. 

21. The kit of claim 19 wherein the solid support is a microtiter dish. 

22. The kit of claim 19 wherein the solid support is a high density array. 
IDENTIFICATION OF ALLELES

ABSTRACT OF THE DISCLOSURE 

A method for determining the genotype of one or more individuals at a 
polymorphic locus employs amplification of a region of DNA using primers 
containing tags and hybridization of the products to one or more probes on a solid 
support. The genotype or ratio of alleles is identified from the pattern of 
hybridization. The method can also be used to determine the frequency of 
different alleles in a population. 
Fig. 1